
    Adaptive Critic Designs

    We discuss a variety of adaptive critic designs (ACDs) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: heuristic dynamic programming (HDP), dual heuristic programming (DHP), and globalized dual heuristic programming (GDHP). The main emphasis is on DHP and GDHP as advanced ACDs. We suggest two new modifications of the original GDHP design that are currently the only working implementations of GDHP. They promise to be useful for many engineering applications in the areas of optimization and optimal control. Based on one of these modifications, we present a unified approach to all ACDs. This leads to a generalized training procedure for ACDs.
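
    As a rough illustration of the HDP member of this family, the sketch below shows a single HDP-style critic update in Python. It assumes a generic plant model `step`, a utility function `utility`, and a simple linear critic over hand-picked features; none of these come from the paper.

        import numpy as np

        def features(x):
            # Hand-picked quadratic features for a toy linear critic (assumption).
            return np.concatenate([x, x ** 2, [1.0]])

        def hdp_critic_update(w, x, u, step, utility, gamma=0.95, lr=1e-2):
            """One HDP critic step: move J(x) toward U(x, u) + gamma * J(x_next)."""
            x_next = step(x, u)                              # plant model (assumed interface)
            target = utility(x, u) + gamma * np.dot(w, features(x_next))
            td_error = np.dot(w, features(x)) - target
            return w - lr * td_error * features(x)           # gradient step on the critic weights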

    Approximation with Random Bases: Pro et Contra

    In this work we discuss the problem of selecting suitable approximators from families of parameterized elementary functions that are known to be dense in a Hilbert space of functions. We consider and analyze published procedures, both randomized and deterministic, for selecting elements from these families that have been shown to ensure a rate of convergence in the L_2 norm of order O(1/N), where N is the number of elements. We show that both randomized and deterministic procedures are successful if additional information about the families of functions to be approximated is provided. In the absence of such additional information, one may observe exponential growth of the number of terms needed to approximate the function and/or extreme sensitivity of the outcome of the approximation to parameters. Implications of our analysis for applications of neural networks in modeling and control are illustrated with examples.
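
    A minimal numerical sketch of the kind of randomized selection procedure analyzed here: random-weight sigmoidal units whose outer weights are fitted by least squares. The target function, parameter ranges, and unit counts below are illustrative assumptions, not taken from the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        x = np.linspace(-1.0, 1.0, 400)
        f = np.sin(3 * np.pi * x) * np.exp(-x ** 2)          # example target function

        def random_basis_fit(n_units, scale=10.0):
            """Draw n_units random sigmoids, fit the outer weights by least squares."""
            a = rng.uniform(-scale, scale, n_units)           # random input weights
            b = rng.uniform(-scale, scale, n_units)           # random biases
            phi = 1.0 / (1.0 + np.exp(-(np.outer(x, a) + b))) # random basis matrix
            w, *_ = np.linalg.lstsq(phi, f, rcond=None)
            return np.sqrt(np.mean((phi @ w - f) ** 2))       # empirical L2 error

        for n in (10, 50, 250):
            print(n, random_basis_fit(n))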

    Training Winner-Take-All Simultaneous Recurrent Neural Networks

    The winner-take-all (WTA) network is useful in database management, very large scale integration (VLSI) design, and digital processing. The synthesis procedure for a WTA network on a single-layer, fully connected architecture with a sigmoid transfer function is still not fully explored. We discuss the use of simultaneous recurrent networks (SRNs) trained by Kalman filter algorithms for the task of finding the maximum among N numbers. The simulations demonstrate the effectiveness of our training approach with a shared-weight SRN architecture. A more general SRN also succeeds in solving a real classification application on car engine data.
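
    For context, the sketch below shows the target behaviour on a single-layer, fully connected sigmoid network: a classical WTA relaxation with self-excitation and uniform lateral inhibition. It is not the Kalman-filter-trained shared-weight SRN of the paper, and the gains alpha and beta are illustrative assumptions that will not separate arbitrarily close inputs.

        import numpy as np

        def wta_relax(inputs, alpha=4.0, beta=4.0, steps=50):
            inputs = np.asarray(inputs, dtype=float)
            y = inputs.copy()
            for _ in range(steps):
                # Shared-weight recurrence: self-excitation minus lateral inhibition.
                net = alpha * y - beta * (y.sum() - y) + inputs
                y = 1.0 / (1.0 + np.exp(-net))                # sigmoid transfer function
            return y                                          # winner near 1, losers suppressed

        print(wta_relax([0.2, 0.8, 0.5, 0.1]))   # the unit with the largest input should win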

    Conservative Thirty Calendar Day Stock Prediction Using a Probabilistic Neural Network

    We describe a system that predicts significant short-term price movement in a single stock utilizing conservative strategies. We use preprocessing techniques, then train a probabilistic neural network to predict only price gains large enough to create a significant profit opportunity. Our primary objective is to limit false predictions (known in the pattern recognition literature as false alarms). False alarms are more significant than missed opportunities, because false alarms that are acted upon lead to losses. We can achieve false alarm rates as low as 5.7% with the correct system design and parameterization.
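
    A generic probabilistic neural network is a Parzen-window classifier; the sketch below biases its decision toward fewer false alarms by requiring the 'gain' class density to dominate by a margin. The features, smoothing width sigma, and margin are illustrative assumptions, not the paper's preprocessing or parameterization.

        import numpy as np

        def pnn_predict(x, X_gain, X_no_gain, sigma=0.5, margin=2.0):
            """Signal a buy only if the 'gain' class density clearly dominates."""
            def density(x, X):
                d2 = np.sum((X - x) ** 2, axis=1)             # squared distances to stored patterns
                return np.mean(np.exp(-d2 / (2.0 * sigma ** 2)))
            return density(x, X_gain) > margin * density(x, X_no_gain)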

    Adaptive Critic Design in Learning to Play Game of Go

    This paper examines the performance of an HDP-type adaptive critic design (ACD) applied to the game of Go. Go is an ideal problem domain for exploring machine learning; it has simple rules but requires complex strategies to play well. All current commercial Go programs are knowledge-based implementations; they utilize input features and pattern matching along with minimax-type search techniques. The extremely high branching factor, however, puts a limit on their capabilities, and they are very weak relative to programs for other games such as chess. In this paper, the Go-playing ACD consists of a critic network and an action network. The HDP-type critic network learns to predict the cumulative utility function of the current board position from training games, and the action network chooses the next move that maximizes the critic's next-step cost-to-go. After about 6000 different training games against a public domain program, WALLY, the network (playing WHITE) began to win some of the games and showed slow but steady improvement on test games.
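
    The move-selection criterion described above can be sketched as a greedy search over legal successor positions; in the paper this argmax is approximated by a trained action network, and `critic`, `legal_moves`, and `apply_move` below are assumed interfaces rather than the authors' code.

        def choose_move(board, critic, legal_moves, apply_move):
            """Pick the legal move whose successor position the critic rates best."""
            best_move, best_value = None, float("-inf")
            for move in legal_moves(board):
                value = critic(apply_move(board, move))   # predicted next-step cost-to-go
                if value > best_value:
                    best_move, best_value = move, value
            return best_move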

    Neurocontroller Alternatives for Fuzzy Ball-and-Beam Systems with Nonuniform Nonlinear Friction

    The ball-and-beam problem is a benchmark for testing control algorithms. Zadeh (1994) proposed a twist to the problem which, he suggested, would require a fuzzy logic controller. This experiment uses a beam partially covered with a sticky substance, increasing the difficulty of predicting the ball's motion. We complicated this problem even more by not using any information concerning the ball's velocity. Although it is common to use the first differences of the ball's consecutive positions as a measure of velocity and an explicit input to the controller, we preferred to exploit recurrent neural networks, inputting only consecutive positions instead. We used truncated backpropagation through time with the node-decoupled extended Kalman filter (NDEKF) algorithm to update the weights in the networks. Our best neurocontroller uses a form of approximate dynamic programming called an adaptive critic design. A hierarchy of such designs exists; our system uses dual heuristic programming (DHP), an upper-level design. To the best of our knowledge, our results are the first use of DHP to control a physical system. It is also the first system we know of to respond to Zadeh's challenge. We do not claim this neural network control algorithm is the best approach to this problem, nor do we claim it is better than a fuzzy controller. It is instead a contribution to the scientific dialogue about the boundary between the two overlapping disciplines.
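
    For readers unfamiliar with DHP, the sketch below shows the core of a derivative critic's training target, lambda(t) = dJ/dx(t), built from the utility gradient and the model Jacobian. It omits the policy-dependence terms of full DHP, assumes `dU_dx` and `model_jacobian` are supplied, and is not the NDEKF-trained controller of the paper.

        import numpy as np

        def dhp_critic_target(x, u, lam_next, dU_dx, model_jacobian, gamma=0.95):
            """Target for the derivative critic:
            lambda*(t) = dU/dx + gamma * (dx(t+1)/dx(t))^T lambda(t+1)."""
            A = model_jacobian(x, u)                  # dx(t+1)/dx(t), shape (n, n), assumed supplied
            return dU_dx(x, u) + gamma * A.T @ lam_next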

    Comparative Study of Stock Trend Prediction using Time Delay, Recurrent and Probabilistic Neural Networks

    Three networks are compared for low-false-alarm stock trend prediction. Short-term trends, particularly attractive for neural network analysis, can be used profitably in scenarios such as option trading, but only with significant risk. Therefore, we focus on limiting false alarms, which improves the risk/reward ratio by preventing losses. To predict stock trends, we exploit time delay, recurrent, and probabilistic neural networks (TDNN, RNN, and PNN, respectively), utilizing conjugate gradient and multistream extended Kalman filter training for TDNN and RNN. We also discuss different predictability analysis techniques and perform an analysis of predictability based on a history of daily closing prices. Our results indicate that all the networks are feasible, the primary preference being one of convenience.
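
    As an illustration of the time-delay input representation a TDNN uses for this task, the sketch below builds sliding-window examples from daily closes and labels each window by whether a minimum gain occurs within a short horizon. The window length, horizon, and gain threshold are illustrative assumptions, not the paper's settings.

        import numpy as np

        def time_delay_dataset(closes, delays=10, horizon=5, gain=0.02):
            """Return (windows of `delays` past closes, binary 'gain within horizon' labels)."""
            X, y = [], []
            for t in range(delays, len(closes) - horizon):
                X.append(closes[t - delays:t])                # the delays most recent closes
                y.append(int(max(closes[t:t + horizon]) >= closes[t - 1] * (1 + gain)))
            return np.array(X), np.array(y)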

    Training Recurrent Neurocontrollers for Real-Time Applications


    Advanced Adaptive Critic Designs

    We present a unified approach to a family of Adaptive Critic Designs (ACDs). ACDs approximate dynamic programming for optimal control and decision making in noisy, nonlinear, or nonstationary environments. This family consists of Heuristic Dynamic Programming (HDP), Dual Heuristic Programming (DHP), and Globalized Dual Heuristic Programming (GDHP), as well as their action-dependent forms (denoted by the prefix AD) [1]. The most powerful of these designs reported previously is GDHP [2,3]. After pointing out problems with the simple ACDs, we describe advanced ACDs and introduce ADGDHP. We also propose a general training procedure for ACDs and discuss some important research issues.
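
    A minimal sketch of the GDHP idea referenced above: the critic is trained on a weighted combination of the scalar cost-to-go (HDP) error and the derivative (DHP) error. The weighting `eta` and the way the two error terms are formed are illustrative assumptions, not the authors' training procedure.

        import numpy as np

        def gdhp_critic_loss(J_pred, J_target, lam_pred, lam_target, eta=0.5):
            """Weighted sum of the HDP (scalar J) and DHP (dJ/dx) critic errors."""
            hdp_error = (J_pred - J_target) ** 2              # scalar cost-to-go error
            dhp_error = np.sum((lam_pred - lam_target) ** 2)  # derivative-critic error
            return eta * hdp_error + (1.0 - eta) * dhp_error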

    Convergence Of Critic-Based Training

    This paper discusses convergence issues when training adaptive critic designs (ACDs) to control dynamic systems expressed as Markov sequences. We critically review two published convergence results for critic-based training and propose to shift emphasis towards more practically valuable convergence proofs. We show a possible way to prove convergence of ACD training. We study ACDs with neural networks in the domain of Markov sequences. A family of ACDs exists and is extensively described in [1, 2]. The most significant difference among ACDs lies in the type of critic they use. The simplest ACDs, e.g., Q-learning, employ a J critic, i.e., a function that evaluates the long-term performance of the closed-loop system. (J is a common designator for the cost-to-go in dynamic programming, hence the critic's notation.) In contrast, advanced ACDs use derivative critics, i.e., critics that output the derivatives of J with respect to the states of the system. Convergence of ACD training is both important and multi...
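
    For reference, the simplest critic-based scheme mentioned here, Q-learning, reduces to the tabular update sketched below. The step size `alpha` and discount `gamma` are illustrative values, and the reward-maximization convention used here differs only in sign from the cost-to-go J discussed above.

        import numpy as np

        def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.95):
            """One Q-learning step on a tabular critic Q[state, action]."""
            td_target = reward + gamma * np.max(Q[s_next])    # bootstrap from the next state
            Q[s, a] += alpha * (td_target - Q[s, a])
            return Q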